A Model-based Line Detection Algorithm in Documents

نویسندگان

  • Yefeng Zheng
  • Huiping Li
  • David S. Doermann
چکیده

In this paper we present a novel model based approach to detect severely broken parallel lines in noisy textual documents. It is important to detect and remove these lines so the text can be segmented and recognized. We use Directional Single-Connected Chain, a vectorization based algorithm, to extract the line segments. We then instantiate a parallel line model with three parameters: the skew angle, the vertical line gap, and the vertical translation. A coarse-to-fine approach is used to improve the estimation accuracy. From the model we can incorporate the high level contextual information to enhance detection results even when lines are severely broken. Our experimental results show our method can detect 94% of the lines in our database with 168 noisy Arabic document images.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

An Improved Flower Pollination Algorithm with AdaBoost Algorithm for Feature Selection in Text Documents Classification

In recent years, production of text documents has seen an exponential growth, which is the reason why their proper classification seems necessary for better access. One of the main problems of classifying text documents is working in high-dimensional feature space. Feature Selection (FS) is one of the ways to reduce the number of text attributes. So, working with a great bulk of the feature spa...

متن کامل

An Improved Flower Pollination Algorithm with AdaBoost Algorithm for Feature Selection in Text Documents Classification

In recent years, production of text documents has seen an exponential growth, which is the reason why their proper classification seems necessary for better access. One of the main problems of classifying text documents is working in high-dimensional feature space. Feature Selection (FS) is one of the ways to reduce the number of text attributes. So, working with a great bulk of the feature spa...

متن کامل

Sequencing Mixed Model Assembly Line Problem to Minimize Line Stoppages Cost by a Modified Simulated Annealing Algorithm Based on Cloud Theory

This research presents a new application of the cloud theory-based simulated annealing algorithm to solve mixed model assembly line sequencing problems where line stoppage cost is expected to be optimized. This objective is highly significant in mixed model assembly line sequencing problems based on just-in-time production system. Moreover, this type of problem is NP-hard and solving this probl...

متن کامل

A Novel Hybrid Approach for Email Spam Detection based on Scatter Search Algorithm and K-Nearest Neighbors

Because cyberspace and Internet predominate in the life of users, in addition to business opportunities and time reductions, threats like information theft, penetration into systems, etc. are included in the field of hardware and software. Security is the top priority to prevent a cyber-attack that users should initially be detecting the type of attacks because virtual environments are not moni...

متن کامل

Neural-Network-Aided On-line Diagnosis of Broken Bars inInduction Motors

This paper presents a method based on neural networks to detect broken rotor bars and end rings in squirrel cage induction motors. In the first part, detection methods are reviewed and traditional methods of fault detection as well as dynamic model of induction motors are introduced using the winding function method. In this method, all stator and rotor bars are considered independently in ord...

متن کامل

الگوریتم جامعی برای مکان یابی خطا در خطوط انتقال دو مداره و چند پایانه ای (بیش از سه پایانه) مبتنی بر داده های PMU

A new PMU-based fault detection/location algorithm for multi-terminal transmission lines is proposed in this paper, which works on the basis of synchronized voltage and current phasors received from PMUs installed in various terminals. The Clark transform (for transposed transmission lines), Eigen-values and eigenvectors theory (for un-transposed ones) are used to decouple 3-phase differential ...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2003